Learning Feature-Value Grammars from Plain Text

Author

  • Tony C. Smith
Abstract

This paper outlines preliminary work aimed at learning Feature-Value Grammars from plain text. Common suffixes are gleaned from a word suffix tree and used to form a first approximation of how regular inflection is marked. Words are generalised according to these suffixes and then subjected to trigram analysis in an attempt to identify agreement dependencies. They are subsequently labeled with a feature whose value is given by the common suffix. A means for converting the feature dependencies into a unification grammar is described wherein feature structures are projected onto unlabeled words. Irregularly inflected words are subsumed into common categories through the process of unification.
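The first two steps of the abstract (collecting common suffixes, then running trigram analysis over suffix-generalised words) can be illustrated with a minimal Python sketch. This is not the paper's implementation: it substitutes a flat count of word-final substrings for the word suffix tree, and the function names, the `max_len` and `min_count` thresholds, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter


def common_suffixes(words, max_len=2, min_count=2):
    """Return word-final substrings shared by at least min_count distinct words.

    Assumption: a flat counter stands in for the suffix tree named in the abstract.
    """
    counts = Counter()
    for w in set(words):
        # Leave at least one character of stem in front of the candidate suffix.
        for k in range(1, min(max_len, len(w) - 1) + 1):
            counts[w[-k:]] += 1
    return {s for s, c in counts.items() if c >= min_count}


def generalise(word, suffixes):
    """Map a word to a suffix class such as '-ed', preferring the longest match."""
    for k in range(len(word) - 1, 0, -1):
        if word[-k:] in suffixes:
            return "-" + word[-k:]
    return word  # no common suffix found: keep the word itself


def trigram_counts(tokens, suffixes):
    """Count trigrams over the suffix-generalised token sequence."""
    classes = [generalise(t, suffixes) for t in tokens]
    return Counter(zip(classes, classes[1:], classes[2:]))


# Hypothetical toy corpus, for illustration only.
tokens = "she walks he talks she walked he talked".split()
suffixes = common_suffixes(tokens)
trigrams = trigram_counts(tokens, suffixes)
```

On real text, frequently recurring trigrams of suffix classes would be the candidates for agreement dependencies. A sketch this crude also picks up noise such as a bare "-e" class, which is one reason a principled suffix selection step matters.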

Similar resources

Joint Bayesian Morphology learning for Dravidian languages

In this paper a methodology for learning the complex agglutinative morphology of some Indian languages using Adaptor Grammars and morphology rules is presented. Adaptor grammars are a compositional Bayesian framework for grammatical inference, where we define a morphological grammar for agglutinative languages and morphological boundaries are inferred from a plain text corpus. Once morphologica...

Machine Learning Comprehension Grammars for Ten Languages

Comprehension grammars for a sample of ten languages (English, Dutch, German, French, Spanish, Catalan, Russian, Chinese, Korean, and Japanese) were derived by machine learning from corpora of about 400 sentences. Key concepts in our learning theory are: probabilistic association of words and meanings, grammatical and semantical form generalization, grammar computations, congruence of meaning, ...

Inductive learning of lexical semantics with typed unification grammars

In the last decade machine learning techniques based on logic such as Inductive Logic Programming (ILP) have started being used in learning grammars from corpora. While the first approaches were based on the translation of grammar into first-order predicate logic, an attempt has been made recently to adapt the ILP learning schema to the feature constraint logic of typed-unification grammars. In...

Learning Probabilistic Dependency Grammars from Labeled Text

We present the results of experimenting with schemes for learning probabilistic dependency grammars for English from corpora labelled with part-of-speech information. We intend our system to produce wide-coverage grammars which have some resemblance to the standard context-free grammars of English which grammarians and linguists commonly exhibit as examples.

Prospects of encoding Java source code in XML

Currently, the only standard format for representing Java source code is plain text-based. This paper explores the prospects of using Extensible Markup Language (XML) for this purpose. XML enables the leverage of tools and standards more powerful than those available for plain-text formats, while retaining broad accessibility. The paper outlines the potential benefits of future XML grammars tha...

Publication date: 1998